To solve the emotional differences between different regions of the video frame and make use of the interrelationship between different regions, a region dual attention-based video emotion recognition method (RDAM) is proposed. RDAM takes as input video frame sequences and learns a discriminatory video emotion representation that can make full use of the emotional differences of different regions and the interrelationship between regions. Specifically, we construct two parallel attention modules: one is the regional location attention module, which generates a weight value for each feature region to identify the relative importance of different regions. Based on the weight, the emotion feature that can perceive the emotional sensitive region is generated. The other is the regional relationship attention module, which generates a region relation matrix that represents the interrelationship of different regions of a video frame. Based on the region relation matrix, the emotion feature that can perceive interrelationship between different regions is generated. The outputs of these two attention modules are fused to produce the emotional features of video frames. Then, the features of video frame sequences are fused by attention-based fusion network, and the final emotion feature of the video is produced. The experimental results on the video emotion recognition data sets show that the proposed method outperforms the other related works.
Loading....